Review Article 

http://dx.doi.org/10.3947/ic.2013.45.3.253
Infect Chemother 2013;45(3):253-259 
pISSN 2093-2340 • eISSN 2092-6448



Host Genomics in Infectious Diseases 

Mark Loeb 

Departments of Pathology and Molecular Medicine, Clinical Epidemiology and Biostatistics, and Michael G. DeGroote Institute for
Infectious Disease Research, McMaster University, Hamilton, Ontario, Canada 



Understanding mechanisms by which genetic variants predispose to complications of infectious diseases can lead to important 
benefits including the development of biomarkers to prioritize vaccination or prophylactic therapy. Family studies, candidate 
genes in animal models, and the absence of well-defined risks where the complications are rare all can point to genetic pre- 
disposition. The most common approach to assessing genetic risk is to conduct an association study, which is a case control 
study using either a candidate gene approach or a genome wide approach. Although candidate gene variants may focus on po- 
tentially causal variants, because other variants across the genome are not tested these studies frequently cannot be replicated. 
Genome wide association studies need a sizable sample and usually do not identify causal variants but variants which may be
in linkage disequilibrium with the actual causal variant. There are many pitfalls that can lead to bias in such studies, including
misclassification of cases and controls, use of improper phenotypes, and genotyping errors. These studies have been limited to 
common genes and rare variants may not be detected. As the use of next generation sequencing becomes more common, it can 
be anticipated that more variants will be confirmed.

Introduction

The purpose of this review article is to address the issue of
genomics in infectious diseases with an emphasis on the host.
Although there are a plentitude of studies that focus on the
molecular characteristics of pathogens, there are far fewer
studies that address the role of human genetics in the
predisposition to infection or more commonly its complications.
This paper will review both the approaches used to study host
genetics in humans and the pitfalls associated with some of
these methods. The focus will be on human disease and
therefore discussion of the use of animal models will be limited to
those where there are genes that have been replicated in
humans. The paper will focus on common genetic variants that
account for complex traits such as infectious diseases using
examples from flaviviruses.

Why study genetic determinants for infectious 
diseases? 

The first question to address is what the potential benefits
of studying genetic variants in humans are. While it is true that
the virulence of pathogens such as bacteria and viruses often
has a molecular basis, and this has to a limited extent been
addressed as capable of increasing complication rates in
humans, less is known about the human host as derived from
human genetic studies. For example, while there have been 
numerous studies to suggest that the H5N1 influenza virus in- 
creases complication rates in humans, much less is known 
about genetic variants that predispose humans to complica- 
tions of influenza [1-3]. There are good arguments to support 
determining genetic variants associated with infectious
diseases. The first is that doing so will lead to a better
understanding of the mechanism of disease. As discussed below,
this is not an easy goal to attain even if a significant genetic
variant is confirmed to be associated with disease. This is
because the marker itself may not be causal but may be linked to
another marker that has yet to be determined. However, if a
causal variant is eventually discovered, this could lead to func- 
tional work to better understand the precise mechanism. 

From a practical standpoint, knowledge about genetic vari- 
ants associated with the risk of infectious complications could 
lead to the development of biomarkers to predict which
individuals will be at high risk for complications of disease. This
in itself, however, is not a trivial task: developing biomarkers can
take years, as both analytic and clinical sensitivity and
specificity need to be determined and validated. However, it is an area
that is lacking in the field of infectious diseases. Having accu- 
rate biomarkers could help target individuals for preferential 
vaccination or prophylactic therapy. 



What are clues for genetic susceptibility to 
infection? 

For certain illnesses, family histories give clues to genetic 
susceptibility. For infectious diseases, exposure to infection in 
families is common so it can be more difficult to separate
exposure from actual susceptibility to infection. This is the
situation with tuberculosis, where it is difficult to separate the
effect of exposure on the risk of tuberculous infection from any
genetic role. However, comparing disease due to tuberculosis
with infection alone would help define genetic predisposition.
A study of mortality among adoptees demonstrated a nearly
sixfold increase in risk of an infectious disease cause of death
in those where one of their biological parents died of an
infectious disease before the age of 50 years [4]. Twin studies have
been an important source of knowledge for genetic predispo- 
sition. Although there are not many examples of such studies, 
influenza is a notable exception. That is, using genealogy data 
from Utah, it was possible to estimate a greater complication 
rate in twins due to influenza [5]. 

One of the most important clues for possible genetic
susceptibility to infections is the paucity of well-defined risk factors.
Serious clinical illness due to West Nile virus emerged rather 
dramatically in North America beginning with a large out- 
break in New York City in 1999. Since that time West Nile virus 
(WNV) emerged as an important human pathogen in North 
America where it eventually became reported in a majority of 
states and provinces in the U.S. and Canada [6]. Although the 
incidence of reported cases has changed from year to year,
the severe complications that can occur in infected cases
remain a concern. For example, for West Nile virus infection
only approximately 1 in 150 individuals who are infected develop
severe illness such as meningitis or encephalitis [7]. This is
suggestive of a genetic susceptibility in humans. Although the incidence
of severe neurological syndromes increases with age and with
immunosuppression, there is an absence of other well-defined
risk factors, again suggesting that there is an underlying
genetic predisposition to complications of disease. For
dengue, the situation is similar. Dengue virus (DENV) is found in
tropical and sub-tropical regions around the world,
predominantly in urban and semi-urban areas [8]. The public health
burden is huge; it is estimated that 2.5 billion people, or two
fifths of the world's population, are at risk from this virus. The
World Health Organization currently estimates that there may
be 50 million cases of DENV infection worldwide annually [9].
Although the vast majority of dengue virus infections result in 
no symptoms or a mild febrile illness, approximately 500,000 
dengue cases progress to life-threatening disease causing
20,000 to 25,000 deaths annually. The clinical presentation
may include fever and an influenza-like syndrome
characterized by headache, retro-ocular and joint pain, rash, and
lymphadenopathy, known as classic dengue or "break-bone
fever" [10, 11]. Following the febrile phase, the disease may
progress to dengue hemorrhagic fever (DHF), characterized
by thrombocytopenia and pleural and abdominal effusions,
and dengue shock syndrome (DSS) (DHF with evidence of
systemic hypoperfusion).

Less than 2% of individuals infected with dengue develop 
dengue hemorrhagic fever (DHF). This suggests that host ge- 
netic factors may play an important role. Indeed, there is an 
absence of DHF in the Haitian population despite hyper-en- 
demic transmission of dengue virus serotypes. Similarly, mul- 
tiple dengue virus serotypes circulate in West Africa, but there 
have been no reports of DHF. Furthermore, blacks were less 
likely to be hospitalized during Cuban dengue virus epidem- 
ics. Although pre-existing immunity may be a confounding 
factor, these reports suggest that genetic predisposition is an 
important factor as well. 

Evidence from animal models can suggest human genetic 
loci that predispose to infectious disease. Unlike dengue, 
where there is no animal model, there have been experiments 
in mice to try to establish a locus for susceptibility for West 
Nile virus. Innate resistance to flavivirus-induced morbidity 
and mortality was first demonstrated in mice in the 1920s and 
showed monogenic autosomal dominant inheritance. These 
mice are susceptible to infections with other viruses but are 
resistant to all flaviviruses. Furthermore, within the mouse ge- 
nus Mus, susceptibility to West Nile virus experimental infec- 
tion is completely correlated with the occurrence of a point 
mutation resulting in the truncation of the
2'-5'-oligoadenylate synthetase (2'-5'-OAS) L1 isoform. This would suggest that
this enzyme is relevant in WNV pathogenesis through an effect
restricting viral replication in target tissues [12]. Indeed, the
cluster of genes encoding 2'-5'-oligoadenylate synthetases
(2'-5'-OAS) has for many years been seen as a prominent
candidate locus since they encode a multimember family of
IFN-inducible proteins known to play an important role in the
established endogenous antiviral pathway.

What are the most common approaches to study 
genetic variants for infectious diseases? 

Although family transmission studies can be conducted, as
well as analysis of rare immune deficiency syndromes [13-15],
the most common way to identify genetic variants that predispose to
infectious diseases is by association studies. The approach is
to compare the frequency of alleles in cases to that in controls.
For example, a case can be defined as a patient who developed a
complication of infection, while the control would be someone who
has evidence of infection (e.g., serological evidence) but did
not develop complications. Such an approach would ensure
that both cases and controls were exposed in a similar
manner to the pathogen, and the difference in outcome could be
inferred to be due to a genetic variant in the host, assuming the
pathogens in cases and controls were similar.
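
The comparison of allele frequencies between cases and controls can be sketched as a simple 2 x 2 allelic test. The following Python is a minimal illustration only, assuming a biallelic SNP and unrelated individuals; the function name and the allele counts are hypothetical and do not come from any study cited in this review.

```python
import math

def allelic_association(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Compare allele counts in cases vs. controls (2 x 2 table).

    Returns (freq_cases, freq_controls, chi2, p_value, odds_ratio).
    """
    n = case_alt + case_ref + ctrl_alt + ctrl_ref
    # Frequency of the alternative allele in each group
    freq_cases = case_alt / (case_alt + case_ref)
    freq_ctrls = ctrl_alt / (ctrl_alt + ctrl_ref)
    # Pearson chi-square for a 2 x 2 table (1 df, no continuity correction)
    num = n * (case_alt * ctrl_ref - case_ref * ctrl_alt) ** 2
    den = ((case_alt + case_ref) * (ctrl_alt + ctrl_ref)
           * (case_alt + ctrl_alt) * (case_ref + ctrl_ref))
    chi2 = num / den
    # Upper tail of the chi-square distribution with 1 degree of freedom
    p_value = math.erfc(math.sqrt(chi2 / 2))
    odds_ratio = (case_alt * ctrl_ref) / (case_ref * ctrl_alt)
    return freq_cases, freq_ctrls, chi2, p_value, odds_ratio

# Hypothetical counts: 100 cases and 100 controls, two alleles per person
result = allelic_association(case_alt=60, case_ref=140, ctrl_alt=40, ctrl_ref=160)
print(result)
```

In practice such a test would be run with an established genetics package and would be accompanied by the quality control and stratification checks discussed later in this review.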

There are two broad types of association studies: candidate
gene studies and genome-wide association studies (GWAS).
Candidate gene studies have an underlying specific
hypothesis that a particular variant or variants are causal; that is,
typically, that a variant (i.e., a single nucleotide polymorphism or
SNP) at such a locus leads to an amino acid change that
ultimately predisposes the patient to disease. Variants in the vitamin
D receptor provide a good example. The vitamin D receptor
mediates the immunoregulatory effects of
1,25-dihydroxyvitamin D3 (1,25 D3), which activates monocytes, stimulating
cellular immune responses and suppressing immunoglobulin
production and lymphocyte proliferation. The C allele of a
SNP at position 352 of the VDR gene has been associated with
tuberculoid leprosy, clearance of hepatitis B infection, and
resistance to pulmonary tuberculosis [16, 17]. In a study
conducted in Vietnam, 327 children admitted to hospital
with dengue shock syndrome were compared to 251
ethnically matched healthy controls [18]. Frequency of the
VDR 352 variant was 2% in cases compared to 3% in controls,
suggesting a possible protective effect (OR: 0.48, 95% CI: 0.21
to 1.09, P = 0.056). It is notable that this study did not have an
adequate sample size, as is the case with many genetic studies.
Such studies are best considered as hypothesis generating
and provide preliminary data to support larger studies.
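
To illustrate why a small study can yield an odds ratio that looks protective but whose confidence interval still includes 1, the sketch below computes an odds ratio with a Woolf (log-based) 95% confidence interval. The counts are hypothetical and chosen for illustration only; they are not the counts from the Vietnamese study [18], and the function name is an assumption.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio with a Woolf (log) confidence interval.

    a, b: variant / non-variant counts in cases
    c, d: variant / non-variant counts in controls
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper

# Hypothetical small study: interval is wide and crosses 1
small = odds_ratio_ci(10, 190, 20, 180)
# Same proportions with four times the sample: interval narrows
large = odds_ratio_ci(40, 760, 80, 720)
print(small)
print(large)
```

The point estimate is identical in both calls; only the width of the interval changes with sample size, which is why underpowered candidate gene results are best treated as hypothesis generating.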

In contrast to candidate gene studies, a GWAS is based
on a genome-wide screen for variants that typically will locate
a variant which itself may not be causal but may be linked (i.e.,
in linkage disequilibrium) to the causal variant. These studies
are based on the common disease-common variant model,
which is based on comparisons of common alleles (frequency > 5%).
Although candidate gene studies used to be more common, as
the cost of chips declined, the number of GWAS with
large sample sizes has increased. In one recent example, a
GWAS of 2,008 children with DSS and 2,018 controls
from Vietnam, a susceptibility locus at MICB (major
histocompatibility complex (MHC) class I polypeptide-related
sequence B) was identified [19]. It was within the broad MHC
region on chromosome 6 but outside the class I and class II
HLA loci (rs3132468, P meta = 4.41 x 10^-11, per-allele odds ratio
(OR) = 1.34, 95% confidence interval: 1.23-1.46) [20]. Another
locus identified was within PLCE1 (phospholipase C, epsilon
1) (rs3765524, P meta = 3.08 x 10^-10, per-allele OR = 0.80, 95%
confidence interval: 0.75-0.86). This study is typical of many
GWAS because the genetic variants that were found to
be associated with disease were completely unexpected and
certainly not a hypothesis that the investigators were testing.
It is also typical in that these variants are likely not causal
themselves but perhaps linked to causal variants.
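
The idea that a genotyped marker may merely be correlated with an untyped causal variant can be made concrete with the usual pairwise linkage disequilibrium statistics (D, D' and r squared). The sketch below assumes known haplotype and allele frequencies for two biallelic loci; all names and numbers are hypothetical.

```python
def linkage_disequilibrium(p_ab, p_a, p_b):
    """Pairwise LD between a marker allele A and a causal allele B.

    p_ab: frequency of haplotypes carrying both A and B
    p_a, p_b: frequencies of allele A and allele B
    Returns (D, D_prime, r_squared).
    """
    d = p_ab - p_a * p_b
    # Largest |D| attainable given the allele frequencies
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = d / d_max if d_max > 0 else 0.0
    r_squared = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r_squared

# Hypothetical frequencies: marker allele 0.30, causal allele 0.20,
# and 0.18 of haplotypes carry both
print(linkage_disequilibrium(p_ab=0.18, p_a=0.30, p_b=0.20))
```

A high r squared means the marker is a good proxy for the causal allele, so an association signal at the marker says little about which of the correlated variants is functional.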

What are possible pitfalls of association studies? 

One of the most important considerations is to choose
appropriate cases and controls. This is a similar principle to
other types of case control studies. As mentioned above, exposure
to an infectious agent can confound the relationship between 
variants and disease. Therefore, selecting individuals with mi- 
crobiologic evidence of exposure to infection is often worth- 
while. While it can be argued that there may not be a differ- 
ence between exposed individuals who do not become ill and 
non-exposed individuals, there is always a possibility for
genetic factors to predispose to infection (as opposed to
complications) and this would argue for the choice of exposed
individuals as controls in order to focus on genetic variants
associated with complications. For example, a recent GWAS
for West Nile virus compared cases with neuroinvasive
disease to controls who were symptomatic (i.e. had West Nile 
fever) with serologic evidence of infection but who did not 
have complications [20]. It is also extremely important to care- 
fully define the phenotype for cases and for the controls. There 
is empiric evidence that misclassification of the phenotype 
can have a profound effect on the results [21]. Defining the 
phenotype well is also important because it can have an im- 
pact on replication studies. That is, using a different pheno- 
type in a replication study may lead to lack of replication of a 
particular variant because the original definition was not used 
and it may have biological implications [22]. While assessing
the effect of a variant across various ethnic groups is a
laudable goal, this may lead to bias if the structure of the
population is not taken into account. This is because the
frequency of alleles at a particular locus can change from
population to population and, if ancestry is not balanced between
cases and controls, this can lead to spurious
associations due to allelic differences, i.e., the relationship
between the disease and the candidate gene is confounded by
the ancestry of the population [23]. So in this case a particular
disease may be more common in a population with a high
frequency of a particular allele, but the disease is not due to the
allele but due to another factor. Approaches to guard against
such population stratification are either to study one particular
ancestry or to use principal components analysis to adjust
for the population structure.
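
Confounding by ancestry can be shown with a small constructed example. In the sketch below, two hypothetical populations differ in both allele frequency and disease frequency; within each population the allele is unrelated to disease, yet the pooled table shows a strong association. A stratified (Mantel-Haenszel) odds ratio is used here purely as a simple stand-in for adjustment; it is not the principal components analysis mentioned above, and all counts are invented for illustration.

```python
def odds_ratio(a, b, c, d):
    """a, b: carriers / non-carriers in cases; c, d: same in controls."""
    return (a * d) / (b * c)

def mantel_haenszel_or(strata):
    """Mantel-Haenszel odds ratio pooled across strata of (a, b, c, d)."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return num / den

# Hypothetical strata (cases carriers, cases non-carriers,
#                      controls carriers, controls non-carriers)
strata = [
    (80, 20, 40, 10),   # population 1: allele common, many cases sampled
    (2, 18, 10, 90),    # population 2: allele rare, few cases sampled
]

# Ignoring ancestry: add the tables together
pooled = tuple(sum(column) for column in zip(*strata))

print("per-stratum ORs:", [odds_ratio(*s) for s in strata])
print("pooled OR:", odds_ratio(*pooled))
print("Mantel-Haenszel OR:", mantel_haenszel_or(strata))
```

The pooled odds ratio is well above 1 even though each stratum has an odds ratio of exactly 1, which is the pattern of spurious association described in the paragraph above.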

Having excellent quality control procedures for genotyping 
is imperative. Quality checks should compare the recorded sex and
ethnicity of each participant with what has been genotyped. It is
also important to have a high call rate (the proportion of
genotyped SNPs that are successfully assigned a genotype).
Sometimes unbeknownst to the investigator, two or more in- 
dividuals in the cohort being tested may be related (known as 
cryptic relatedness). Since the methods for analysis are based 
on unrelated or independent individuals only one of the relat- 
ed individuals can be kept in the analysis. Genotyping error 
can also occur when the distribution of cases and controls on
plates is not at random. For example, when all cases are
genotyped separately from controls, differential failure of SNPs in
cases compared to controls may create an imbalance that can
lead to biased results.
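
Two of the checks described above, per-SNP call rate and differential genotyping failure between cases and controls, can be sketched as follows. The thresholds (95% call rate, 2% difference), the data layout and the SNP names are assumptions made for this illustration, not recommendations from the cited literature.

```python
def call_rate(genotypes):
    """Fraction of samples with a non-missing genotype call (None = failed)."""
    called = sum(g is not None for g in genotypes)
    return called / len(genotypes)

def flag_snps(case_calls, ctrl_calls, min_call_rate=0.95, max_diff=0.02):
    """Return {snp: [reasons]} for SNPs failing the two checks."""
    flagged = {}
    for snp in case_calls:
        cr_case = call_rate(case_calls[snp])
        cr_ctrl = call_rate(ctrl_calls[snp])
        reasons = []
        if min(cr_case, cr_ctrl) < min_call_rate:
            reasons.append("low call rate")
        # Unequal failure in cases vs. controls can mimic an association
        if abs(cr_case - cr_ctrl) > max_diff:
            reasons.append("differential missingness")
        if reasons:
            flagged[snp] = reasons
    return flagged

# Hypothetical data: genotypes coded 0/1/2, None for a failed call
case_calls = {"snp_ok": [0] * 100, "snp_bad": [1] * 90 + [None] * 10}
ctrl_calls = {"snp_ok": [0] * 99 + [None], "snp_bad": [1] * 100}

flagged = flag_snps(case_calls, ctrl_calls)
print(flagged)
```

Checks for sex mismatch and cryptic relatedness need genome-wide data and are usually performed with dedicated software, so they are not sketched here.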

As mentioned above, sample size is often a limitation in
infectious diseases genetic association studies. Unlike
conditions that are highly prevalent in many populations, such as
diabetes or high cholesterol, it can be very challenging and
expensive to create large cohorts of individuals with
complications of infectious diseases. Because of this, it can be difficult
to obtain adequate sample sizes. Obtaining an adequate
sample size is key because in a GWAS the P value threshold for
significance is 5 x 10^-8, in order to account for the high number of SNPs
being compared, which typically is between 500,000 and 2 million
SNPs. Generally speaking, investigators should aim to have at
least 1,000 cases and 1,000 controls in the first phase of a
GWAS.
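
The link between the stringent threshold and sample size can be sketched with a rough power approximation for an allelic test. The code below assumes one million tests (so that 0.05 divided by the number of tests gives 5 x 10^-8), a normal approximation for the difference in allele frequencies, and hypothetical values for the control allele frequency and per-allele odds ratio. It is an approximation for illustration, not a substitute for a formal power calculation.

```python
import math
from statistics import NormalDist

def case_allele_freq(control_freq, odds_ratio):
    """Allele frequency in cases implied by a per-allele odds ratio."""
    odds = odds_ratio * control_freq / (1 - control_freq)
    return odds / (1 + odds)

def approx_power(n_cases, n_controls, control_freq, odds_ratio, alpha):
    """Approximate power of a two-sided allelic test (normal approximation)."""
    case_freq = case_allele_freq(control_freq, odds_ratio)
    # Each individual contributes two alleles
    se = math.sqrt(case_freq * (1 - case_freq) / (2 * n_cases)
                   + control_freq * (1 - control_freq) / (2 * n_controls))
    z_effect = abs(case_freq - control_freq) / se
    z_crit = -NormalDist().inv_cdf(alpha / 2)
    return NormalDist().cdf(z_effect - z_crit)

n_tests = 1_000_000        # assumed number of SNPs tested
alpha = 0.05 / n_tests     # Bonferroni-style genome-wide threshold

# Hypothetical scenario: control allele frequency 0.30, per-allele OR 1.3
for n in (200, 1000, 4000):
    print(n, approx_power(n, n, 0.30, 1.3, alpha))
```

Under these assumed inputs, power rises steeply with the number of cases and controls, which is why modest effect sizes at genome-wide significance demand large cohorts.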

Examples from the flavivirus literature 

Some challenges mentioned above are evident in examples
of association studies of flaviviruses. For West Nile virus,
one candidate gene study examined 33 West Nile virus infected
individuals who had developed either fever, meningitis, or
encephalitis [24]. They were compared to 60 healthy controls
who were drawn from an available database and therefore were
unlikely to have been exposed to the virus. The study identified one
synonymous SNP in OASL exon 2 as being at significantly higher
frequency in the cases compared to the controls (P < 0.004).
However, no adjustment was made for multiple testing and
there was no attempt at replication. An important principle in
genetic studies is that results, either of candidate gene studies
or genome-wide association studies, should be replicated in an
independent population. The P value needs to be adjusted for multiple
testing because of the large number of comparisons being made.
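
A Bonferroni adjustment is one simple way to account for multiple comparisons in a candidate gene study. The sketch below multiplies each P value by the number of tests performed; the number of SNPs tested (20) is a hypothetical figure, since the number of comparisons in the study described above is not stated here.

```python
def bonferroni_adjust(p_values):
    """Bonferroni-adjusted P values: multiply by the number of tests, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

# Hypothetical: 20 SNPs tested, smallest nominal P value is 0.004
nominal = [0.004] + [0.5] * 19
adjusted = bonferroni_adjust(nominal)
print(adjusted[0])
```

With 20 tests, a nominal P value of 0.004 no longer falls below the conventional 0.05 level after adjustment, which illustrates why an unadjusted result of this kind calls for replication.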

More recently, another candidate gene study examining
the OAS gene cluster alone again suggested a predisposition to
WNV infection [25]. This study compared OAS variants in a
cohort of symptomatic individuals (fever, meningitis,
encephalitis) and asymptomatic but infected individuals identified
through a blood donor bank to a non-infected cohort. There
were no genetic variants associated with severity of disease, in
keeping with a larger association study recently reported [20].
However, the authors reported that a SNP (rs10774671) was
significantly more frequent in West Nile virus infected than
non-infected individuals (P = 0.0002). Limitations include the
fact that there was no replication in a separate cohort,
although this SNP did have an effect on viral replication in an
ex vivo model of primary human lymphoid tissue [25].

One study compared symptomatic to asymptomatic
individuals with WNV infection and found that SNPs in the
interferon pathway (IRF3 and MX1) were associated with
symptoms [26]. In contrast to other studies that examined
complications, they also found that OAS1 was associated with
an increased risk for encephalitis and paralysis. An
association between symptomatic WNV disease and homozygosity
for the CCR5Δ32 mutation in the chemokine receptor gene
CCR5 was initially reported [27, 28]. However, this association
was not replicated, although the data were suggestive of a link
between the CCR5Δ32 mutation and clinical manifestations of
infection [29]. Moreover, in a recent large association study (560 neuroinvasive
cases and 950 controls and a replication cohort of 264 cases
and 296 controls) no such evidence for an effect was noted
[20]. One possible reason for the discrepancy is that the latter
study compared cases to controls all of whom were
symptomatic when infected with WNV. In contrast, the earlier published
reports compared cases to controls with no symptoms.

Future directions for GWAS in the field of 
infectious diseases 

To date, although there have been many important
association studies that have led to biological insight in the field of
infectious diseases, the vast majority of results, similar to those
in other fields, have shown relatively low effect sizes (odds ratio
< 2). Such results do not provide an optimal basis for
screening since the risk of disease complications may already be
quite low (as with flaviviruses, < 1%) and a low odds ratio will
not lead to a substantial increase in absolute risk. However,
with the advent of next-generation sequencing, we may be
able to anticipate the discovery of more rare alleles that may
have a greater effect size and thus help bridge the gap between
biological discovery and progress in clinical medicine [30].
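
The arithmetic behind the statement that a low odds ratio adds little absolute risk when the baseline risk is small can be sketched as follows. The baseline risk of 1% and the odds ratio of 1.5 are illustrative values within the ranges mentioned above, and the baseline is treated as the risk in non-carriers, which is a simplifying assumption.

```python
def risk_in_carriers(baseline_risk, odds_ratio):
    """Absolute risk in carriers given non-carrier risk and an odds ratio."""
    odds = odds_ratio * baseline_risk / (1 - baseline_risk)
    return odds / (1 + odds)

baseline = 0.01    # assumed risk of complications in non-carriers
carrier_risk = risk_in_carriers(baseline, 1.5)
print(carrier_risk, carrier_risk - baseline)
```

Under these assumptions the absolute risk in carriers remains below 1.5%, an increase of less than half a percentage point, which is too small to be useful for screening on its own.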